Handwritten and printed text separation in historical documents
Historical documents present many challenges for Optical Character Recognition (OCR)
systems, especially documents of poor quality containing handwritten annotations,
stamps, signatures, and historical fonts. As most OCR systems recognize either machine-printed
or handwritten text, the printed and handwritten parts have to be separated before applying
the respective recognition system. This thesis addresses the problem of segmenting
handwritten and printed text in historical Latin text documents. Data in which handwritten
and machine-printed components appear on the same page, or even overlap each other,
are scarce, as are the corresponding pixel-wise annotations. To alleviate this, the data
synthesis method proposed in [12] was applied and new datasets were generated. The
newly created images and their pixel-level labels were used to train the Fully Convolutional
Model (FCN) introduced in [5]. The newly trained model has shown better results in the
separation of machine-printed and handwritten text in historical documents.
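
To illustrate what pixel-wise annotation of overlapping handwritten and machine-printed components can look like, here is a minimal sketch of synthesizing one training sample. It is not the method of [12]: the function `synthesize`, the binary-mask representation, and the four class ids (including a separate overlap class) are assumptions made purely for illustration.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = ink, 0 = background

# Hypothetical class ids; the labeling scheme of the thesis may differ.
BACKGROUND, PRINTED, HANDWRITTEN, OVERLAP = 0, 1, 2, 3


def synthesize(printed: Mask, handwritten: Mask,
               offset: Tuple[int, int]) -> Tuple[Mask, Mask]:
    """Paste `handwritten` onto `printed` at the (row, col) `offset`.

    Returns (image, labels): `image` is the combined binary ink mask,
    `labels` holds one class id per pixel.
    """
    height, width = len(printed), len(printed[0])
    # Start from the printed layer alone.
    image = [row[:] for row in printed]
    labels = [[PRINTED if px else BACKGROUND for px in row] for row in printed]

    dr, dc = offset
    for r, row in enumerate(handwritten):
        for c, px in enumerate(row):
            rr, cc = r + dr, c + dc
            # Skip background pixels and anything falling outside the page.
            if not px or not (0 <= rr < height and 0 <= cc < width):
                continue
            image[rr][cc] = 1
            labels[rr][cc] = OVERLAP if printed[rr][cc] else HANDWRITTEN
    return image, labels


# Toy example: a 3x4 "printed" page and a 2x2 "handwritten" stroke.
printed = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
handwritten = [
    [1, 0],
    [1, 1],
]
image, labels = synthesize(printed, handwritten, offset=(0, 2))
for row in labels:
    print(row)
```

Because the label map is produced at the same moment as the image, every synthetic page comes with exact per-pixel ground truth, which is the kind of image and label pair a segmentation model needs for training.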